Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P

Authors

  • Franz Franchetti
  • Yevgen Voronenko
  • Gheorghe S. Almasi
Abstract

We present the automatic synthesis of the HPC Challenge’s Global FFT, a large 1D FFT across a whole supercomputer system. We extend the Spiral system to synthesize specialized single-node FFT libraries that combine a data layout transformation with the actual on-node FFT computation, improving network performance by enabling all-to-all collectives. We ran our optimized Global FFT benchmark on up to 128k cores (32 racks) of ANL’s BlueGene/P “Intrepid” and achieved 6.4 Tflop/s, outperforming ANL’s 2008 HPC Challenge Class I Global FFT run (5 Tflop/s). Our code was part of IBM’s winning 2010 HPC Challenge Class II submission. Further, we show first single-thread results on BlueGene/Q.
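To make the idea of pairing local FFTs with a data layout transformation concrete, here is a minimal single-process sketch of the textbook four-step decomposition of a 1D DFT of length n = n1 * n2. It is not the Spiral-generated code and is not claimed to be the algorithm used in the paper. The function names `dft` and `four_step_fft` are hypothetical, and the naive `dft` merely stands in for an optimized on-node kernel. In a distributed setting, the transpose in step 3 is the part that would typically map to an all-to-all exchange.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for an optimized on-node FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Compute the length n1*n2 DFT of x from smaller DFTs (illustrative only)."""
    n = n1 * n2
    assert len(x) == n

    # Step 1: n1 independent DFTs of length n2 over the stride-n1 subsequences.
    # cols[j1][k2] transforms x[j1], x[j1 + n1], x[j1 + 2*n1], ...
    cols = [dft(x[j1::n1]) for j1 in range(n1)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j1*k2 / n).
    for j1 in range(n1):
        for k2 in range(n2):
            cols[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)

    # Step 3: transpose, so that rows[k2][j1] == cols[j1][k2].
    # Assumption: on a distributed machine this is the all-to-all exchange.
    rows = [[cols[j1][k2] for j1 in range(n1)] for k2 in range(n2)]

    # Step 4: n2 independent DFTs of length n1, one per row.
    rows = [dft(r) for r in rows]

    # Output element k2 + n2*k1 is rows[k2][k1].
    out = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            out[k2 + n2 * k1] = rows[k2][k1]
    return out
```

Calling `four_step_fft(x, 3, 4)` on a length-12 input should agree with `dft(x)` up to floating-point rounding, which is a quick way to check the index bookkeeping.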


Similar articles

The BlueGene/L Supercomputer and Quantum ChromoDynamics

We describe our methods for performing quantum chromodynamics (QCD) simulations that sustain up to 20% of the peak performance on BlueGene supercomputers. We present our methods, scaling properties, and first cutting edge results relevant to QCD. We show how this enables unprecedented computational scale that brings lattice QCD to the next generation of calculations. We present our QCD simulati...


Vectorization Techniques for BlueGene/L’s Double FPU

This paper presents vectorization techniques tailored to meet the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit, which is a core element of the node ASICs of IBM's 360 Tflop/s supercomputer BlueGene/L. The paper focuses on the general-purpose basic-block vectorization methods provided by the Vienna MAP vectorizer. In addition, the paper int...


Evaluation of the HPC Challenge Benchmarks in Virtualized Environments

This paper evaluates the performance of the HPC Challenge benchmarks in several virtual environments, including VMware, KVM and VirtualBox. The HPC Challenge benchmarks consist of a suite of tests that examine the performance of HPC architectures using kernels with memory access patterns more challenging than those of the High Performance Linpack (HPL) benchmark used in the TOP500 list. The tes...


Automatically Tuned FFTs for BlueGene/L's Double FPU

IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system featuring a peak performance of 360 Tflop/s. This system is supposed to lead the Top 500 list when it is installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueG...


Portable Implementation of Real-Time Signal Processing Benchmarks on HPC Platforms

For the evaluation of HPC systems for real-time signal processing, real-time benchmarks have recently been proposed by the US DoD signal processing and HPC communities. For the implementation of real-time benchmarks, we have developed efficient communication algorithms for M to N K block cyclic communication. Using our algorithms, we have implemented real-time D FFT and Corner Turn benchmarks that ha...



Publication date: 2012